TutoRial - Part 1
Marine Ecosystem Dynamics - 2024
New script
As seen during the presentation, we will keep track of our progress in a script. We therefore need to open a new script using one of the options below:
- From the menu: 1. File > 2. New File > 3. R script
- With the keyboard shortcut: ⌘/Ctrl + ⇧ + N
R syntax
R is a programming language that uses a simplified syntax. In this section, we will explore how to write a script and execute it.
But first some syntax information:
- Everything after # is considered a comment and will not be executed. It is very important to write down what we are doing, so we do not get lost the next time we open our scripts.
# 2 + 2 will not work because of the #
2 + 2 # We should then annotate our script like this
#> [1] 4
- Several instructions can be written on one line, but they must be separated by a semicolon
2 + 2
#> [1] 4
3 * 2
#> [1] 6
# This can also be written as follows:
2 + 2 ; 3 * 2
#> [1] 4
#> [1] 6
- In R we can name any object using =, <-, -> or assign()
c(1, 2, 3, 4) -> my_first_vector
my_vector <- c(1, 2, 3, 4)
my_function = function(x){x + 2}
assign("x", c(2, 3, 4, 5))
- == is a logical operator that can be read as "is equal to"; conversely, "is not equal to" is written !=
2 + 2 == 4
#> [1] TRUE
3 * 2 == 4
#> [1] FALSE
3 * 2 != 4
#> [1] TRUE
Exercises
Using a new R script, do these calculations:
- 2^7
2^7
#> [1] 128
- cos(π)
?cos
?pi
cos(pi)
#> [1] -1
- The sum of all numbers from 1 to 100
Operations can be applied to an entire vector at once
vector <- seq(from = 1, to = 100, by = 1) # Create a vector from 1 to 100
sum(vector) # Calculate the sum
#> [1] 5050
- Create a parameter x1 equal to 5 and a parameter x2 equal to 10
x1 <- 5 ; x2 <- 10
- Is 2 * x1 equal to x2?
2 * x1 == x2
#> [1] TRUE
Functions
As seen during the lecture, R works with functions that can:
- Already be implemented in base R
- Come from another package
- Be created by the user
We will see these three examples in this section, but first it is important to remember that the typical structure of a function is function(argument1, ...).
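As a small illustration of this structure, the sketch below calls the base R function round() in three ways. The choice of round() and the variable names are only for illustration; they are not part of the original tutorial.

```r
# Arguments can be matched by position...
by_position <- round(3.14159, 2)

# ...or by name, in which case their order no longer matters.
by_name <- round(digits = 2, x = 3.14159)

# Arguments that have a default value can be left out (digits defaults to 0).
default_digits <- round(3.14159)

by_position
by_name
default_digits
```

Naming the arguments makes a script easier to read again later, which is why most examples in this tutorial write them out.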
Fortunately, R helps us remember which arguments are needed:
- Using help() or ?
help(topic = "sin")
?sin
- Using example()
example(sum)
#>
#> sum> ## Pass a vector to sum, and it will add the elements together.
#> sum> sum(1:5)
#> [1] 15
#>
#> sum> ## Pass several numbers to sum, and it also adds the elements.
#> sum> sum(1, 2, 3, 4, 5)
#> [1] 15
#>
#> sum> ## In fact, you can pass vectors into several arguments, and everything gets added.
#> sum> sum(1:2, 3:5)
#> [1] 15
#>
#> sum> ## If there are missing values, the sum is unknown, i.e., also missing, ....
#> sum> sum(1:5, NA)
#> [1] NA
#>
#> sum> ## ... unless we exclude missing values explicitly:
#> sum> sum(1:5, NA, na.rm = TRUE)
#> [1] 15
For functions that come from external packages, we first need to install the package. The most common way to do so is by executing install.packages("Package_Name"). Then, whenever we want to use its functions, we start the script by executing library(Package_Name).
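The install-then-load step could be sketched as follows. To keep the example runnable without an internet connection it uses the stats package, which ships with R, so the installation branch is never triggered here; the variable names pkg and spread are hypothetical, and in practice you would put the name of the package you need.

```r
# Name of the package to use (assumption: "stats" ships with every R install).
pkg <- "stats"

# Install only if the package is not available yet (needed once per computer).
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package (needed in every new R session).
library(pkg, character.only = TRUE)

# Functions from the package can now be called directly.
spread <- sd(c(2, 4, 4, 4, 5, 5, 7, 9))
spread
```

Checking with requireNamespace() first avoids reinstalling a package every time the script is run.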
Finally, if we really cannot find a suitable function in a package, we can create our own functions following this general structure, but this will not be covered in this tutorial:
my_function <- function(<argument1>, <argument2>, ...){
<here comes the definition of my function>
return(<output of the definition>)
}
Exercises
- What does the function log() do, and where does it come from (base R or another package)?
?log # By default it computes the natural logarithm of a value; it comes from base R
log(10)
- What are the mandatory arguments for the function plot()?
?plot # x is needed; y is optional if x is an appropriate structure (e.g., plot(1:10))
- Is there help associated with the functions from a loaded package?
The function ggplot() comes from the package ggplot2
library(ggplot2)
?ggplot # Yes, help is also available for functions from loaded packages
Vectors
R works with vectors, on which we can do our calculations. Several ways exist to create a vector:
- Using c(): values are placed next to each other and separated by commas.
c(1, 2, 1, 4) # It works with whole numbers
c(1.1, 2.4, 3.14652) # It works with decimal numbers
c("chocolate", "ice-cream") # It works with character strings
c(TRUE, FALSE) # It works with logical values
- Using rep() to repeat the same values several times.
rep(x = 3, 2) # it reads: repeat 2 times the value x that is equal to 3
rep(x = "chocolate", 3) # it reads: repeat 3 times the value x that is equal to "chocolate"
- Using seq() to create a sequence of values. It only works for numeric values!
seq(from = 0, to = 10, by = 2) # it reads: create a sequence of values from 0 to 10 in steps of 2
seq(from = -1, to = 1, by = 0.2) # it also works with negative and decimal values
- Combining all of the above
rep(x = c(seq(from = 2, to = 3, by = 0.2), 5), 2)
c(rep(x = "character", 5), "other character")
c(seq(from = 2, to = 10, by = 2), rep(x = 1000, 2), c(1, 4, 2))
Exercises
- Create a vector v1 that contains the values 1, 2, 3, 4, 6
v1 <- c(1, 2, 3, 4, 6)
- Create a vector v2 that contains 10 times the values 1, 2, 3, 4, 6
v2 <- rep(v1, 10)
- Create a vector v3 that repeats TRUE, FALSE 2 times
v3 <- rep(c(TRUE, FALSE), 2)
- Create a vector v4 that goes from 10 to 2000
v4 <- 10:2000
# or
v4 <- seq(from = 10, to = 2000, by = 1)
- Create a vector v5 that contains v1, v2, v3 and 2 times v4
v5 <- c(v1, v2, v3, rep(v4, 2))
Dataframe
Most likely, we will work with data stored in dataframes. A dataframe is composed of observations (rows) and variables (columns). We can think of a dataframe as multiple vectors of the same length put together.
For example, the dataframe below (named df) is composed of 4 vectors:
- Species, which contains the species names
- Abundance, which contains the abundances of the species
- Location, which contains the location of the species
- Date, which contains the sampling date
#> Species Abundance Location Date
#> 1 Acartia 34 Askö 03-09-2024
#> 2 Pseudocalanus 12 Askö 04-09-2024
#> 3 Centropages 17 Askö 02-09-2024
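The tutorial does not show how df was created. One way it could have been built, assuming each column is simply typed in as a vector, is sketched below with data.frame(); the values are copied from the printout above.

```r
# Each column is a vector; all vectors must have the same length.
df <- data.frame(
  Species   = c("Acartia", "Pseudocalanus", "Centropages"),
  Abundance = c(34, 12, 17),
  Location  = rep("Askö", 3),
  Date      = c("03-09-2024", "04-09-2024", "02-09-2024"),
  stringsAsFactors = FALSE  # keep text columns as character (default since R 4.0)
)
df
```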
We can access the individual columns (i.e., vectors) using $
df$Species
#> [1] "Acartia" "Pseudocalanus" "Centropages"
df$Abundance
#> [1] 34 12 17
df$Location
#> [1] "Askö" "Askö" "Askö"
df$Date
#> [1] "03-09-2024" "04-09-2024" "02-09-2024"
Exercises
- Create a vector genus containing the character strings "Acartia", "Centropages", "Temora", "Acartia", "Centropages", "Temora"
genus = c("Acartia", "Centropages", "Temora", "Acartia", "Centropages", "Temora")
# or genus = rep(c("Acartia", "Centropages", "Temora"), 2)
- Create a vector station containing the character strings "Askö", "Askö", "Askö", "Tjarnö", "Tjarnö", "Tjarnö"
station = c(rep("Askö", 3), rep("Tjarnö", 3))
- Create a vector abundance containing the values 3, 10.2, 4, 2.3, 4, 9.4
abundance = c(3, 10.2, 4, 2.3, 4, 9.4)
- Combine all the vectors in a dataframe called df
df <- data.frame("Genus" = genus,
                 "Station" = station,
                 "Abundance" = abundance)
- Create a vector output that corresponds to the column Abundance of the dataframe df. Is output similar to the vector abundance?
output <- df$Abundance # or df[[3]]
output == abundance
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE
Importing data in R
More often, we enter our data in spreadsheets. We then need to import the data into R to process them.
To do so, we use the read.* function family.
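To give an idea of how the read.* family fits together, the sketch below writes a tiny stand-in dataset to a temporary file and reads it back in two ways. The values and variable names are hypothetical and only serve the illustration.

```r
# Write a tiny stand-in dataset to a temporary CSV file (hypothetical values).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Taxa = c("Acartia", "Temora"), Biomass = c(6.65, 1.82)),
          file = tmp, row.names = FALSE)

# read.csv() expects comma-separated values with a header row.
from_csv <- read.csv(tmp)

# read.table() is the general function; read.csv() is a shortcut around it.
from_table <- read.table(tmp, header = TRUE, sep = ",")

# Both calls should give the same dataframe for this file.
all.equal(from_csv, from_table)

# Remove the temporary file.
unlink(tmp)
```

Other members of the family follow the same pattern, for example read.csv2() for semicolon-separated files with decimal commas and read.delim() for tab-separated files.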
A typical data import protocol looks like this:
- Set the working directory with its absolute path
setwd("/Absolute/Path/To/Working/Directory")
- Import your dataset into your environment
df <- read.csv("./Relative/Path/Dataset.csv")
- Examine the structure of the data to check that the import worked well
str(df)
head(df)
tail(df)
Exercises
- Import the dataset into your environment
df <- read.csv("./assets/zooplankton_seasonality.csv")
- How many rows and columns does this dataset contain?
The structure of the dataset shows that there are 7 variables (columns) and 2956 observations (rows)
str(df)
#> 'data.frame': 2956 obs. of 7 variables:
#> $ Month_abb : chr "Jan" "Jan" "Jan" "Jan" ...
#> $ Year : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
#> $ Station : chr "BY15" "BY31" "BY5" "BY15" ...
#> $ Coordinates: chr "20.05000/57.33333" "18.23333/58.58812" "15.98333/55.25000" "20.05000/57.33333" ...
#> $ Group : chr "Copepoda" "Copepoda" "Copepoda" "Copepoda" ...
#> $ Taxa : chr "Acartia" "Acartia" "Acartia" "Centropages" ...
#> $ Biomass : num 6.65 1.82 5.56 5.74 1.23 ...
- What are the headers of the columns?
Both the structure and the head show that the headers are: Month_abb, Year, Station, Coordinates, Group, Taxa, Biomass
head(df)
#> Month_abb Year Station Coordinates Group Taxa Biomass
#> 1 Jan 2009 BY15 20.05000/57.33333 Copepoda Acartia 6.650319
#> 2 Jan 2009 BY31 18.23333/58.58812 Copepoda Acartia 1.816994
#> 3 Jan 2009 BY5 15.98333/55.25000 Copepoda Acartia 5.562097
#> 4 Jan 2009 BY15 20.05000/57.33333 Copepoda Centropages 5.738561
#> 5 Jan 2009 BY31 18.23333/58.58812 Copepoda Centropages 1.228759
#> 6 Jan 2009 BY5 15.98333/55.25000 Copepoda Centropages 14.405224
- What is the last row?
To see the last row, use the tail function
tail(df)
#> Month_abb Year Station Coordinates Group Taxa Biomass
#> 2951 Dec 2021 BY15 20.05000/57.33333 Copepoda Temora 32.2266648
#> 2952 Dec 2021 BY31 18.23333/58.58812 Copepoda Temora 7.6000062
#> 2953 Dec 2021 BY5 15.98333/55.25000 Copepoda Temora 23.0666650
#> 2954 Dec 2021 BY15 20.05000/57.33333 Rotatoria Synchaeta 1.0400010
#> 2955 Dec 2021 BY31 18.23333/58.58812 Rotatoria Synchaeta 0.0800001
#> 2956 Dec 2021 BY5 15.98333/55.25000 Rotatoria Synchaeta 1.2900000
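The same questions can also be answered with more targeted base R functions. The sketch below uses a small stand-in dataframe with hypothetical values (named demo_df so it does not overwrite df); the same calls should work on the imported dataset.

```r
# Stand-in dataframe (hypothetical values); use the imported df in practice.
demo_df <- data.frame(
  Taxa    = c("Acartia", "Centropages", "Temora", "Synchaeta"),
  Biomass = c(6.65, 5.74, 32.23, 1.29)
)

dim(demo_df)    # number of rows, then number of columns
nrow(demo_df)   # number of rows only
ncol(demo_df)   # number of columns only
names(demo_df)  # column headers

# tail() shows the last 6 rows by default; n = 1 keeps only the last one.
last_row <- tail(demo_df, n = 1)
last_row
```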